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Abstract 

Background: The dried root of Polygala tenuifolia, named Radix Polygalae, is a well-l<nown traditional Cliinese medicine. 
Triterpenoid saponins are some of the most important components of Radix Polygalae extracts and are widely studied 
because of their valuable pharmacological properties. However, the relationship between gene expression and triterpenoid 
saponin biosynthesis in P. tenuifolia is unclear. 

Methodology/Findings: In this study, ultra-performance liquid chromatography (UPLC) coupled with quadrupole time-of- 
flight mass spectrometry (Q-TOF l\/lS)-based metabolomic analysis was performed to identify and quantify the different 
chemical constituents of the roots, stems, leaves, and seeds of P. tenuifolia. A total of 22 marker compounds (VIP>1) were 
explored, and significant differences in all 7 triterpenoid saponins among the different tissues were found. We also observed 
an efficient reference gene GAPDH for different tissues in this plant and determined the expression level of some genes in 
the triterpenoid saponin biosynthetic pathway. Results showed that MVA pathway has more important functions in the 
triterpenoid saponin biosynthesis of P. tenuifolia. The expression levels of squalene synthase (SQS), squalene 
monooxygenase (SQE), and beta-amyrin synthase (P-AS) were highly correlated with the peak area intensity of triterpenoid 
saponins compared with data from UPLC/Q-TOF MS-based metabolomic analysis. 

Conclusions/Significance: l\\\s finding suggested that a combination of UPLC/Q-TOF MS-based metabolomics and gene 
expression analysis can effectively elucidate the mechanism of triterpenoid saponin biosynthesis and can provide useful 
information on gene discovery. These findings can serve as a reference for using the overexpression of genes encoding for 
SQS, SQE, and/or p-AS to increase the triterpenoid saponin production of P. tenuifolia. 
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Introduction 

Radix Polygalae, which is the root oi Polygala tenuifolia Willd. 
or P. sihirica L. (Fam. Polygalaceae) as listed in Chinese 
Pharmacopoeia (2010 edition), has been used in traditional 
medicine for tonic, sedative, antipsychotic, and expectorant 
purposes for centuries in China [1,2]. The studies on the chemical 
components of this herb have mainly focused on various saponins, 
xanthones, and oligosaccharide esters [3-6]. Among them, 
triterpenoid saponins are documented as an important group of 
active components. The structures of the triterpenoid saponins in 
P. tenuifolia are rather complex [7]. The triterpenoid saponins 



have a triterpene presenegenin aglycone as basic structure, and the 
slight differences among these saponins are a result of various 
substitutes including saccharides and/or acyl groups linked to C- 
28. Considering the complicated isolation and complete structural 
characterization of triterpenoid saponins, more methods need to 
be estabhshed to analyze the chemical constituents of triterpenoid 
saponins in P. tenuifolia. In recent years, UPLC coupled with 
electrospray ionization quadrupole time-of-flight mass spectrom- 
etry (Q;TOF MS) has been widely used as a powerful tool in the 
analysis of complex saponin metabolites because this tool shortens 
the analysis time and provides accurate mass measurements [8]. 



PLOS ONE I www.plosone.org 



1 



August 2014 | Volume 9 | Issue 8 | el 05765 



Polygala tenuifolia qRT-PCR Triterpenoid Saponins Enzyme Genes 



Our report, which is an investigation of constituents in P. 
tenuifolia by ultra-performance liquid chromatography (UPLC) / 
Q;TOF MS, aims to analyze the peak area intensity differences of 
triterpenoid saponins among roots, stems, leaves, and seeds, and 
quantify their metabolites using metabolomic technology. 

Modern pharmacological studies ha\ i' already demonstrated 
that these triterpenoid saponins exhibited a wide variety of 
activities, such as cognitive improving, anti-angiogenic, hypolip- 
idemic, anti-inflammatory, anti-depressive, anxiolytic, sedative- 
hypnotic, and neuroprotective actions [2,9-16]. Although various 
chemical and pharmacological properties of triterpenoid saponins 
have been extensively studied, there is no study that has focused on 
the biosynthetic pathway of the triterpenoid saponins in Radix 
Polygalae. The method of terpenoid biosynthesis in plants is 
through the isoprenoid pathway by cyclization of 2, 3-oxidosqua- 
lene to primarily give cucurbitane skeleton. Two biosynthetic 
pathways are possible for presenegenin skeleton: the mevalonate 
(MVA) in cytosolic pathway and the non-mevalonate or 2-C- 
methyl-D-erythritol 4-phosphate/l-deoxy-D-xylulose 5-phosphate 
(MEP/DOXP) in plastidial padiway [17]. The MEP pathway is 
absent in higher animals and fungi, but in green plants, the MEP 
and MVA pathways co-exist in separate cellular compartments 
[18]. The triterpenoid saponin skeleton may undergo various 
oxidations, substitution, and glycosylation mediated by cyto- 
chrome P450-dependent monooxygenases (CYP 450s), glycosyl- 
transferases (UGTs), and other enzymes [19]. To date, only a few 
publications that report the identification of CYP 450s and UGTs 
involved in biosynthesis of triterpenoid saponins are available. 
Characterized CYP450s and UGTs, such as CYP 93E1, CYP 
88D6, CYP ,51H10, CYP 72A154, CYP 716A12, UGT 73P2, 
UGT 74M1, UGT 73K1, and UGT 73F3, have been reported to 
be involved in modification of triterpenoid saponins [20]. In 
nucleotide database of NCBI from P. tenuifolia (up to 2014.02), 
only three unigenes and two complete cDNAs had been identified 
and whether enzymes lia\c an important function in the 
biosynthesis pathway of triterpenoid saponins is stiU unknown. If 
we continue to figure out the whole biosynthesis pathway of 
triterpenoid saponins, many enzyme genes in triterpenoid saponin 
biosynthesis are needed to be screened by the gene expression of 
these enzym(;s for furth(;r analysis. 

Gene expression analysis has been frequentiy used to analyze 
mRNA in different organisms, tissues, and in several transgenic 
and gene mutation experiments [21-24]. Individual RNAs can be 
measured by Northern blot [25], RNase protection analysis [26], 
and in situ hybridization [27]. However, these techniques are 
time-consuming and/ or require large quantities of RNA for 
detection [28]. Quantitative real-time polymerase chain reaction 
(qRT-PCR) has a broad range of detection with advantages in 
repeatability, sensitivity, and specificity. It is also valuable and easy 
to use when studying rare transcripts and detecting low abundance 
mRNAs [29] . Given the substantial \'ariations in RNA's stability, 
quantity, and purity, and the efficiency of reverse transcription and 
PGR, the reliability of qRT-PCR results is highly dependent on 
the reference genes chosen. However, many of these reference 
genes may not be stably expressed under all conditions [30-32]. 
Some reference genes have been reliably validated in one plant 
species but not well suited in others [33]. Therefore, experimental 
proof is suggested when reference genes are to be used under a 
new experimental condition or in a new plant species [34]. 
Moreover, given the limitations of Sanger sequencing [36-37], 
expressed sequence tag (EST) analysis has identified only five 
mRNAs, which are all involved in biosynthesis of the saponins. 
The next-generation sequencing technology, known as RNA-seq, 
emerged as an effective approach for high-throughput sequence. 



now facilitates studies of transcriptome in a rapid way and has 
been used to explore gene structure and expression profiling on 
traditional Chinese medicine [38-39]. In our previous research, 
the transcriptome of Radix Polygalae was evaluated by Illumina 
HiSeq 2000 (data not shown), which provides a basis for further 
research about gene expression analysis of P. Tenuifolia. 

In the present study, the differences among chemical compo- 
nents of triterpenoid saponins in different tissues from the roots, 
stems, leaves, and seeds of P. tenuifolia were identified by UPLC/ 
Q;TOF MS. qRT-PCR was further performed to screen reference 
genes (Table SI) and measure the expression of nine rel(;\ant 
genes (Table S2) in triterpenoid saponin backbone biosynthesis 
pathway (from acetyl-CoA or pyruvate to sapogenin) of P. 
tenuifolia. The expression levels of three CYP 450s and three 
UGTs genes (Table S3) were also analyzed. Those six genes were 
all from the CYP 450s and UGTs families which were identified to 
be involved in biosynthesis of triterpenoid saponins and were also 
found in the transcriptome of Radix Polygalae. This study aimed 
to discover the differential gene expression in different tissues, and 
explore the excellent genetic traits in the triterpenoid saponin 
biosynthesis pathway. The results may provide useful information 
for deeper understanding of the biosynthesis pathway of triterpe- 
noid saponins in P. tenuifolia. 

Results 

Metabolite analysis of P. tenuifolia by UPLC/Q-TOF MS 

The UPLC system provides a rapid, effective, and convenient 
method to analyze the chemical constituent's variance between 
different tissues in P. tenuifolia , whereas Q,-TOF MS provides the 
accurate mass measurement. Figure 1 shows definite differences 
among the four samples from a visual examination of MS total ion 
current (TIC) chromatograms. A total of 25 compounds, including 
4 xanthone C-glycosides, 6 sucrose esters, 8 ohgosaccharide multi- 
esters, and 7 triterpene saponins, were identified. The details of 
identified compounds are summarized in Table 1 . Figure 2 shows 
the typical MS/MS spectrum of Onjisaponin TG. The observa- 
tion of [M-H-Glc-HjO-COa-Fuc-Rha-Xyl-Api-HMG-CHaO]" 
at m/z 455.3153 (Y;) and [M-H-Glc-HaO-COa-Fuc-Rha-Xyl- 
Api-HMG]" (Yj) at m/z 425.3045 in the MS/MS .spectrum can 
be considered as the diagnostic ions for this type of triterpene 
saponins, and the fragment ions at m/z 1235.5663, corresponding 
to the losses of 144 Da proved the occurrence of 3-hydroxy-3- 
methyl-5-pentanoic acid ester (Xj). 

Multivariate statistical analyses 

The normalized data sets from each sample column contained 
1826 variables. The resulting integral data were imported in 
SIMCA-P for multivariate analysis. PC A was used first to detect 
any inherent trends within the data as an unbiased statistical 
method. PCA score plots showed that the groups of roots, stems, 
leaves, and seeds were clearly clustered into four groups 
(Figure 3A). To further study the variance between the different 
tissues and to determine potential biomarkers, supervised PLS-DA 
was subsequently used. The PLS-DA model was also validated by 
a permutation test with 200 permutations (Figure 3B) (R2 
X = 0,929, Q2 = 0.925). The results indicated that the PLS-DA 
pattern combined with the loading plot was suitable to find out 
potential biomarkers. 

Potential biomarkers 

Based on the normalized data of UPLC/Q;TOF MS separa- 
tion, the loading plots (Figure 3C) were constructed for PLS-DA. 
The variables with VIP value > 1.000 were marked as potential 
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(D) 




Figure 1 . MS TIC chromatograms of roots (A), stems (B), leaves (C), and seeds (D) of P. tenuifolia by UPLC/Q-TOF MS in negative ion 
mode. 

doi:1 0.1 371/joumal.pone.01 05765.g001 



biomarkers. As shown in Table 2, a total of 22 marker compounds 
accountable for the different metabolite profiles of the roots, stems, 
leaves and seeds were observed. Sibiricoses Al, Sibiricoses A5, and 
Lancerin were the compounds particularly responsible for these 
variations. 

The main objective of this experiment is to focus on the changes 
of triterpenoid saponins between the different tissues. Five of the 
seven triterpenoid saponins had important functions in the 
intergroup differences. Furthermore, the peak area intensity of 
all seven triterpenoid saponins was analyzed (Figure 3D). In this 
experiment, the peak area intensity of most marker compounds 
were higher in the roots (^<0.05), whereas Lancerin and 7-0- 
Methylmangiferin were higher in the stems, leaves, and seeds 
(Table 2). 

Screening of reference genes of P. tenuifolia 

Seven candidate traditional reference genes including 18s 
ribosomal RNA (18s RNA), crystaUinum ubiquitin-conjugating 
enzyme (UBC 2), actin 1 1 (ACT 1 1), glyceraldehyde-3-phosphate 
dehydrogenase (GAPDH), alpha tubulin (TUA), elongation factor 
1-alpha (EFlot), and actin 1 (ACT 1) were selected (Table 3). 
Based on Figure 4A, the expression levels of the seven reference 
genes in different tissues varied extensively with Ct values ranging 
from 8 to 30 cycles. Most abundantly transcribed was 18s RNA 
with Ct value of less than 10 cycles. The rest of the genes were 



moderately expressed mRNAs with Ct values between 20 and 27 
cycles. 

The geNorm software was used to analyze the gene expression 
stabilities of the seven reference genes. This software calculates the 
average expression stability value (M) based on the average pair- 
wise variation between all genes tested [43]. The gene with the 
lowest M value illustrates the most stable expression, whereas the 
highest M value indicates least stable expression. Except for 
UBC2, all genes had M value below the geNorm threshold of 1.5, 
whereas ACT 11 and GAPDH had the lowest M value (0.613) 
(Figure 4B). Therefore, ACT 1 1 and GAPDH had the most stable 
expression and UBC2 had the least stable expression. 

The NormFinder software, whose criteria is the same with 
geNorm, was used to confirm results obtained by the geNorm 
program. Similar with the geNorm method, the gene with the 
lowest M value has the most stable expression, and the gene with 
the highest M value has the least stable expression. Results of 
NormFinder were showed in Figure 4C, GAPDH and ACT 1 1 
had the most stable expression in different tissues. Stable genes 
identified in geNorm were confirmed by NormFinder, and 
GAPDH was chosen as the reference gene in the different tissues 
of P. lenuifolia. 
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Figure 2. Extracted ion chromatogram (A) and MS/MS spectra (B) of Onjisaponin TG. Glc = |3-D-glucopyranosyl; Fuc = [3-D-fucopyranosyl; 
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qRT-PCR analysis for genes involved in triterpenoid 
saponin biosynthesis pathways 

To explore the mechanism of the triterpenoid saponin changes 
in the difTerent tissues and understand its biosynthesis pathway in 
P. tenuifolia, the 15 genes (Table 3) putatively expressed in 
triterpenoid saponin biosynthesis pathway were analyzed by qRT- 
PCR. As shown in Figure 5A, among the nine genes involved in 
triterpenoid saponin backbone biosynthesis, phosphomevalonate 
kinase (PMK), SQS, SQE, and (3-AS genes in the roots were 
significandy overexpressed with 2 to 13-fold higher expression 
than the other tissues (^<0.05). The expression level of 4-hydroxy- 
3-methylbut-2-enyl diphosphate reductase (HDR), (E)-4-hydroxy- 
3-methylbut-2-enyl-diphosphate synthase (HDS), and farnesyl 
diphosphate synthase (FPS) in the stems were significantly higher 
compared with that in the roots (/*<0.05). Moreover, cycloartenol 
synthase (CAS) genes exhibited the highest expression in the seeds. 

Furthermore, the correlation coefficients between the peak area 
intensity of triterpenoid saponins and expression levels were 
computed by SPSS. The expression levels of SQS, SQE, and (3-AS 
showed high correlation with the peak area intensity of triterpe- 
noid saponins in different tissues (Table 4). The transformation of 
trans,trans-farnesyl diphosphate (FPP) to (S)-2,3-oxidosqualene 
requires the function of SQS and SQE [35]. The expression of 
SQE in the roots was significantly over expressed with an average 
of six fold compared with the other tissues ip<O.Ob). The next 
important step in the pathway is through P-AS and CAS, which 
catalyzes 2,3-oxidosqualene to form triterpenoid saponins and 
sterol, respectively [36]. The expression level of CAS was higher in 
the leaves and seeds compared with the root ip<O.Ob), whereas the 
P-AS expression in the root was higher (p<0.05) and overex- 



pressed with an average of 13 fold compared with the stems, 
leaves, and seeds. 

The correlation between the gene expression level and peak 
area of compound also represents an approach to determine the 
possible CYP 450 and UGT genes involved in the triterpenoid 
saponin biosynthesis. However, no apparent regularity in the 
expression of CYP88D6, CYP716B1, CYP72A1, UGT74B1, 
UGT73B2, and UGT73C6 (Figure 5B) was observed. Results 
demonstrated that UGT73B2 mRNA levels were highest in the 
roots and the expression levels of CYP88D6, CYP716B1, 
CYP72AI, UGT74B1, and UGT73C6 were highest in leaves, 
respectively. However, the expression of these genes was 
significandy different from SQS, SQE, and P-AS. In addition, 
the correlation coefficient between the gene expression level and 
peak area intensity of triterpenoid saponins were less than 0.50 
(Table 4). 

Discussion 

The recent advances in metabolomics have provided new 
insights in the meaningful biological changes of numerous 
metabolites using liquid chromatography-mass spectrometry 
(LC-MS), gas chromatography-mass spectrometry (GC-MS), and 
nuclear magnetic resonance (NMR) technologies. In this study, 
LC-MS metabolomic profdes have clearly demonstrated the 
changes of secondary metabolites particularly the triterpenoid 
saponins in the difiFerent tissues of P. tenuifolia. Furthermore, the 
nine enzyme genes involved in the triterpenoid saponin backbone 
biosynthesis and six enzyme genes involved in the oxygenation and 
glycosylation of cyclization product (Figure 6) were analyzed. The 
understanding of the biological significance of the alteration of 
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Figure 3. PCA scores plots derived from UPLC/Q-TOF MS data for extracts obtained from four different tissues (A); Permutation test 
witKi 200 permutations of PLS-DA model (B); PLS-DA loadings plot obtained from the metabolic profiles of four different tissues 
(C); Peak area intensity of triterpenoid saponins differences between the four tissue groups (D). * The stems, leaves, and seeds group 
compared with the roots group, p<0.05. 
doi:10.1371/journal.pone.0105765.g003 



these triterpenoid saponins and genes may provide evidence for 
fijrther discovery of enzyme gene witli key role in P. tenuifolia. 

UPLC/Q-TOF MS analysis and accumulation of 
triterpenoid saponins 

Considering tlie complexity of the plant metabolite composi- 
tion, plant metabolomic analyses have many challenges. Com- 
pared with GC-MS and NMR technologies, HPLC technology 
has an advantage on the analysis of plant secondary metabolites. 
Currently, HPLC ultraviolet detector (HPLC-UV), HPLC eva- 
portive light scattering detector (HPLC-ELSD), HPLC photodiode 
array detector (HPLC-PDA), and UPLC/Q-TOF MS methods 
have been developed for the qualitative and/or quantitative 
analysis of plant secondary metabolites. Among these analytical 
methods, the UPLC coupled with Q,-TOF MS has become a 
powerful tool for metabolite profiling and identifying the 
complicated components in herbal medicine [44-45]. In previous 
studies, positive and negative ion modes were investigated. The 
result were that the detection of saponins is less difficult using 
negative ion mode and the confirmation of molecular ions or 



quasimolecular ions for the identification of each peak [42] is also 
easier. Thus, mass data monitored with negative ion mode were 
utilized for component characterization. 

Triterpenoid saponins, as a category of secondary metabolites, 
indirectly participate in plant growth and development. To protect 
the plant from herbivore and to be part of the plants' defense 
systems (anti-microbial, anti-fungicidal, and insecticidal) are the 
most important functions of triterpenoid saponins in plants [46] . 
The triterpenoid saponin accumulations are influenced by several 
environmental factors, such as seasonal variation, nutrient, light 
irradiation, or its combined effects. The distribution of triterpenoid 
saponins also demonstrates seasonal fluctuations or greatly varies 
in the individual plant organs or tissues. For instance, Lin et al. 
[47] found that peak area intensity of saponins in the tubers of 
Dioscorea pseudojaponica is the maximum protection provider for 
this reproductive organ. Saponins, whose levels were increased in 
the early stages of berry development of Phytolacca dodecandra, 
also prevent fruit loss and ensure seed maturation [48]. 

In this study, result of metabolomic analysis showed a clear 
separation in the PCA space of the different tissues of P. tenuifolia 
analyzed by UPLC/Q;TOF MS. Considerable information in 
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Figure 4. Expression levels of candidate reference genes in the different tissues by qRT-PCR (A); Average expression stability 
values (M) of the candidate reference genes calculated by geNorm (B); Gene expression stability and ranking of reference genes as 
calculated by NormFinder (C). 

doi:10.1371/journal.pone.0105765.g004 



biomarkers were obtained through accurate mass measurement 
and MS/MS analysis. The specific accumulation of triterpenoid 
saponins in die root of P. tenuifolia indicated diat it might 
counteract soU-borne fungi, which is similar with the accumulation 
of saponins in the root of Avenue strigosa [49]. Moreover, the 
different peak area intensity and distribution of triterpenoid 
saponins in different tissues provide an excellent model to research 
its biosynthesis pathway and screen enzyme gene with key role in 
its biosynthesis. 

Selection of reference genes in P. tenuifolia 

The qRT-PCR technique is one of the most common methods 
to quantify gene expression levels in order to understand biological 
processes. However, a reliable method to normalize its expression 
is necessary. Thus, an appropriate internal reference gene is 
required for reliable quantification of gene transcripts [50]. We 
examined the expression levels of seven reference genes obtained 
from transcriptome data of P. tenuifolia in different tissues. We 
also analyzed their expression using geNorm and NormFinder. 
Based on these softwares, data showed that top two positions of the 
reference genes for the different tissue samples were similar. 
Because GAPDH is a commonly used reference gene to normalize 
the gene expression data in qRT-PCR assay [51], it was also 
chosen as the reference gene in the different tissues of P. tenuifolia . 



Biosynthesis of triterpenoid saponin backbone 

Current ideas consider that the early stage of triterpenoid 
saponin biosynthesis in plants starts from the cytosolic MVA and 
plastidial MEP pathway [17]. The MVA pathway converted 
acetyl-CoA to isopentenyl diphosphate (IPP), whereas MEP 
pathway started from pyruvate to IPP and then to dimethylallyl 
diphosphate (DMAPP) (Figure 6). However, less progress has been 
made toward determining the precise source of isoprene units, and 
clarifying what the pathway is important. In this study, we 
determined the gene expression of MK and PMK of MVA 
pathway, and HDR and HDS of MEP pathway. Results showed 
that gene expression levels of MK and PMK were generally high 
in the roots, whereas HDR and HDS were higher in the stems. 
The correlation coefiicient showed that gene expression of MK 
and PMK had higher correlation with the peak area intensity of 
triterpenoid saponins than HDR and HDS (Table 4). Result 
suggested that MVA pathway has more important functions in the 
triterpenoid saponin biosynthesis of P. tenuifolia. 

Geranyl diphosphate (GPP) is obtained through the condensa- 
tion of IPP and DMAPP, and FPP, which is the common 
precursor of the vast array of sesquiterpenes, resulted from the 
addition of a second IPP unit. Two trans,trans-FPP molecules are 
facilitated by SQS to produce squalene. Squalene is oxidized by 
SQE to (S)-2,3-oxidosqualene, which is the first step leading to 
cyclizations and common precursor of phytosterols and triterpe- 
noid saponins [51]. Moreover, (5- AS facilitates (S)-2,3-oxidosqua- 
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Table 3. Primer sequences for qRT-PCR. 





Gene 


Annotation 


Sequences 




Product 
size (bp) 






Forward 


Reverse 




ACT 11 


Actin 11 Mma 


TTGGAAGGTGCTGAGGGAA 


TGCTTAGTGGCGGAACGAC 


167 


18 s RNA 


18S ribosomal RNA 


CCTGTTATTGCCTCAAACTTCC 


GTGGAGCGATTTGTCTGGTT 


140 


GAPDH 


Glyceraldehyde-3-phosphate dehydrogenase 


ACAGCAACGTGCTTCTCACC 


CCCTTCATCACCACCGACTA 


128 


TUA 


Alpha tubulin mRNA 


TTCTCCTTCCTCCATACCCTC 


ATCAACTACCAGCCTCCCACT 


189 


EFIa 


Elongation factor 1 -alpha mRNA 


CAAAGGCACGGTATGATGAAAT 


GTGGGACCCTTGTACCAGTCAA 


157 


UBC 2 


Crystallinum ubiquitin-conjugating enzyme 


CTCAACAATCTCCCGTACCCTT 


GTCATTACTCTGCGACCCAAAC 


109 


ACT 1 


ACTl mRNA 


TGTGATGGTTGGTATGGGACA 


AGTTGCTAACAATTCCGTGCT 


110 


MK 


Mevalonate kinase 


CGGTTGTTCATGGATCTACTGC 


TCAAGTCCCATATCCTTCAGC 


128 


PMK 


Phosphomevalonate kinase 


GTAGCCCTCGTGCTTCAATCT 


TTTGCTGTAGCTGCTGCCTTT 


145 


HDR 


4-hydroxy-3-methylbut-2-enyl diphosphate 
reductase 


TTCCATCCACCAACTACCAGC 


ATGCTACCCAAGAGCGACAAG 


81 


HDS 


(E)-4-hydroxy-3-methylbut-2-enyl-diphosphate 
synthase 


CAGGCATATCGTTTGCTTGTAG 


CTTCATTCTCCCATCCTCACC 


105 


FPS 


Farnesyi diphosphate synthase 


CCATACAAGGCAAATCGTCAT 


GCAATGCGCTACTCACTCCTC 


166 


SQS 


Squalene synthase 


TTCTATGCTTCACAAGGTCTCACG 


ACATCTGTAGGGACGCTTGTATCA 


144 


SQE 


Squalene monooxygenase 


TATGAAGCGACCATTGTGAAAG 


GCTCAACGGATCTATGGGTAT 


120 


CAS 


Cycloartenol synthase 


CTAATGCCCGGACTTGTAATCA 


CCAAACATGGTGCTGTGACCT 


158 


P-AS 


Beta-amyrin synthase 


TACTGTATTCCCAGAAGAGCATC 


GAATCCACTTTCTTGCTCTAACA 


202 


CYP88D6 


Cytochrome P45088D6 


CAGGAACCCAAGATTGACAGC 


GCCATAGACTCTTTGGAGCAA 


138 


CYP716B1 


Cytochrome P450716B1 


ACTTGCTGGAACTGAGTGGTGA 


CCATCTGTCCGATGTACTGCTT 


99 


CYP72A1 


Cytochrome P45072A1 


TTGCCTTGTAGACCCTGTTGT 


AGCACTGTCGCATTGTCCTTA 


125 


UGT 74B1 


UDP-glucosyl transferase 74B1 


ATGAAGCAGGCTATGAACAAGC 


CACATCAAGAGCCCAAGGAAAA 


164 


UGT73B2 


UDP-glucosyl transferase 73B2 


TCTGGGACTCCCTGATAACCT 


AGTTCCGATTCAGAGCCTTTT 


133 


UGT73C6 


UDP-glucosyl transferase 73C6 


TTTGCTTTGGGAGTTGGTGTC 


CCGTTTCATTATTGGCGTCTT 


123 



doi:l 0.1 371 /journal.pone.Ol 05765.t003 



lene backbone into a chair-chair-chair conformation in contrast 
with the chair-boat-chair conformation during CAS catalysis. 
Based on the experimental results, the gene expression levels of 
SQS, SQE, and P-AS in the root were significantly higher than in 
the other tissues. With regard to the total peak area intensity of 
triterpenoid saponins, the correlation coefficients between the 
above three gene expression levels and peak area intensity were 
higher than the others. This result indicated that SQS, SQE, and 
P-AS have important functions in the triterpenoid saponin 
biosynthesis pathway of P. tenuifolia. Mangas et al. [52] suggested 
that SQS and P-AS have regulatory function by comparing 
triterpenoid saponin peak area intensity with the expression level 
of some related genes in the cauli of Centella asiatica . Han et al. 
[53] demonstrated that squalene enhances the accumulation of 
SQE with SQS, dammarenediol II synthase (DDS), P-AS, and 
cycloartenol synthase (PNX) by RT-PCR analysis in Panax 
ginseng. Considering the present experiment result, the overex- 
pression of SQS, SQE, and P-AS genes may be increasing the 
triterpenoid saponin production in P. tenuifolia. 

Design of the cyclization product - oxygenation and 
glycosylation 

As compared with onjisaponin backbone biosynthesis, less 
information about the late stages (from sapogenin to various 
saponin) of triterpenoid saponin biosynthesis is available. Most 
common sapogenin modifications are oxygenation catalyzed by 
CYP450s protein families and glycosylation catalyzed by UGTs at 



various positions of the backbone [54-55] . Prior to this study, few 
publications report the identification of CYP450s and the 
involvement of UGTs in the biosynthesis of triterpenoid saponins 
(Table 5), and the identified genes from P. tenuifolia have not yet 
been accounted. 

In our previous research, a total of 466 and 157 gene reads are 
annotated as CYP450s and UGTs in the transcriptome of P. 
tenuifolia (data not shown), respectively. Based on the transcrip- 
tome data, six candidate genes, including CYP450 and UGT, 
which are in the same family of the identified P450s and UGTs 
involved in the biosynthesis of triterpenoid saponins, are selected 
for qRT-PCR analysis in different tissues. Seki et al [61] have 
reported that CYP88D6 expression is detected in the roots and 
stolons by RT-PCR, but no amplification is observed in the leaves 
or stems of Glycyrrhiza (licorice) plants. However, in the current 
study, the CYP88D6 expression in the leaves or stems was 
observed and did not show much difference among the different 
tissues. Carelli et al [60] have identified that CYP716AI2 is 
involved in saponin synthesis of Medicago truncatula using a 
combined genetic and biochemical approach. However, in vitro 
enzymatic activity assays indicate that CYP716A12 catalyzes the 
oxidation of P-amyrin and erythrodiol at the C-28 position, 
generating oleanolic acid. Nevertheless, the gene expression of 
CYP716B1, from the same family of CYP716A12, was detected m 
the roots, stems, leaves, and seeds of P. tenuifolia and showed no 
correlation with triterpenoid saponin peak area intensity in the 
present research. Moreover, Seki et al [61] demonstrated that 
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Figure 5. Expression pattern of genes involved in triterpenoid saponin bacltbone biosyntliesis patKiway in tKie different tissues of P. 
Tenuifolia by qRT-PCR (A); QRT-PCR analysis of CYP450S and UGTs in the different tissues of P. Tenuifolia (B). * The stems, leaves, and 
seeds group compared with the roots group, p<0.05. 
doi:10.1371/journal.pone.0105765.g005 



CYP72A154 expressed in an engineered yeast strain, which 
endogenously produced 1 1-oxo-b-amyrin (a possible biosynthetic 
intermediate between b-amyrin and glycyrrhizin) and catalyzed 3 
sequential oxidation steps at C-30 of 1 1-oxo-b-amyrin, may supply 
in situ to produce glycyrrhetinic acid. This research also showed 
that CYP72A154 was expressed in the roots, stolons, and stems, 
whereas no transcripts were observed in the leaves of Glycyrrhiza 
plants [61]. However, in this study, the transcripts of GYP72A1 
were not only detected in the roots, seeds, and stems, but also in 
the leaves of P. tenuifolia. 

Regarding CYP450s genes, some UGT genes have also been 
cloned and identified. Research showed that UGT74M1 was 
identified by screening UGT sequences among ESTs of Saponaria 
vaccaria and catalyzed linkage of glucosyl-moieties to the C28- 
carboxy group of preferably oleanane-type sapogenins through an 



ester bond [63]. Furthermore, UGT73F3 was identified in 
Medicago truncatula and catalyzed glucosylation of hederagenin 
and other oleanane sapogenins at the G28-carboxy group [59]. In 
addition, the possible involvement in the glycosylation of 
brassinosteroids of other characterized UGTs of this family such 
as UGT73C5, UGT73C6, UGT73C8, UGT73F1, and 
UGT73P1 have also been reported [65]. The majority of UGTs 
involved in glycosylation belong to the UGT73 and UGT74 
families. In the current study, the gene expression of UGT74B1, 
UGT73B2, and UGT73C6 did not show much correlation with 
the peak area intensity of triterpenoid saponins, which is similar 
with three CYP450 genes in P. tenuifolia. 
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Figure 6. Putative triterpenoid saponin biosyntKiesis patKiway in P. tenuifolia. 

doi:1 0.1 371/journal.pone.01 05765.g006 



Conclusions 

In summary, significant differences in all 7 triterpenoid saponins 
among the different tissues were found. GAPDH was chosen as the 
reference gene in the different tissues of P. tenuifolia. Results from 
this study suggested that MVA pathway has more important 
fimctions and SQS, SQE, and P-AS are the enzyme genes with 
key role in the triterpenoid saponin biosynthesis of P. tenuifolia. 
This result was established through the correlation analysis 
between the triterpenoid saponin peak area intensity and 



expression levels of 1 5 relevant genes involved in the biosynthesis. 
The metabolomic analysis by UPLC/Q;TOF MS combined with 
the gene expression analysis by qRT-PCR technique provide 
useful information in studying gene discovery and an effective 
approach in understanding the mechanism of onjisaponin 
biosynthesis. Along with the technology advances and its broader 
availability, the digital gene expression profile (DEG)-based 
generation sequencing or DNA microarray-based gene expression 
cluster analysis may also facilitate gene discovery. 



PLOS ONE I www.plosone.org 



12 



August 2014 | Volume 9 | Issue 8 | el 05765 



Polygala tenuifolia qRT-PCR Triterpenoid Saponins Enzyme Genes 



Table 5. CYP450s and UGTs involved in triterpenoid saponin biosynthesis. 





Nsme 




Gene function 


Reference(5) 


CYP93E1 


Glycine max (L.) Merr 


Catalyze the hydroxylation of p-amyrin and its 22-hydroxyderivate 
sophoradiol at position C24 


Shibuya et al. (2006) [56] 


CYP51H10 


Avena strigosa Schreb. 


Not biochemically characterized 


Qi et al. (2006) [57] 


CYP88D6 


Glycyrrhiza uralensis Fisch. 


Catalyze C11 oxygenation of p-amyrin 


Seki et al. (2008} [58] 


CYP93E3 


Glycyrrhiza uralensi Fisch. 


Catalyze 24-hydroxylation of p-amyrin 


Seki et al. (2008} [58] 


CYP93E2 


Medicago truncatula L. 


Not biochemically characterized 


Naoumkina et al. (2010) [59] 


CYP716A12 


Medicago truncatula L. 


Catalyze C28 oxygenation of p-amyrin 


Carell et al. (2011) [60] 


CYP72A154 


Glycyrrhiza uralensis Fisch. 


Catalyze C30 oxygenation of p-amyrin 


Seki et al. (2011} [61] 


UGT73P2 


Glycine max (L) Merr 


Extend the C3 saccharide moiety of the monoglycosylated oleanane 
aglycone soyasapogenol B with a galactosyl 


Shibuya et al. (2010) [62] 


UGT91H4 


Glycine max (L.) Merr 


Catalyze the third glycosylation step in soyasaponin 1 biosynthesis 


Shibuya et al. (2010) [62] 


UGT73F3 


Medicago truncatula L. 


Catalyze glucosylation of hederagenin and other oleanane 
sapogenins at the C28-carboxy group 


Naoumkina et al. {2010} [59] 


UGT74M1 


Saponaria vaccaria 


Catalyze linkage of glucosyl-moieties to the C28-carboxy group of 
preferably oleanane type sapogenins through an ester bond 


Meesapyodsuk et al. (2007} [63] 


UGT71G1 


Medicago truncatula L. 


Position specificity not elucidated 


Achnine et al. (2005) [64] 


UGT73K1 


Medicago truncatula L. 


Position specificity not elucidated 


Achnine et al. (2005) [64] 



doi:1 0.1 371 /journal.pone.Ol 05765.t005 



Materials and Methods 

Materials and reagents 

The fresh roots, stems, leaves, and seeds from the samples of P. 
tenuifolia, were collected in Xiongshan forest farm, mountain of 
Wulong, Bayi town, Changzhi county, Shanxi province (latitude: 
N35°57'59.98"; longitude: El 13°0r36.58"). No specific permits 
were required from the forest farm to select samples. The forest 
farm is not privately-owned and the field studies did not involve 
protected species. Each sample was divided into two parts, in 
which one part was used for UPLC/Q;TOF MS analysis. The 
other parts were collected in RNase-free tubes and stored frozen in 
liquid nitrogen at — 80°C. 

AcetonitrUe of UPLC grade were obtained from Fisher 
Scientific Co., Ltd. (Waltham, MA, USA). Formic acid and 
analytic-grade methanol were purchased from Fisher Scientific 
Co., Ltd. (Waltham, MA, USA). RNAiso Plus kit, PrimerScript 
RT master mix perfect real-time kit, and SYBR Premix Ex Taq II 
kit were purchased from TAKARA Co., Ltd. (Dalian, China). 
Primers were synthesized from Sangon Biotech (Shanghai, China). 

Sample preparation for UPLC/Q-TOF MS analysis 

Roots, stems, leaves, and seeds of 9 individual P. tenuifolia 
plants were sequentially pulverized in a mortar with hquid 
nitrogen. A total of 1 g drug powder of each sample was extracted 
for 30 min by ultrasonication in 1 0 mL of methanol, and then the 
sample solution was filtered through a 0.22 JXm filter membrane 
before use. A 2 |aL ahquot was used for UPLC/Q;TOF MS 
analysis. 

UPLC/Q-TOF MS analysis 

P. tenuifolia metabolite profiling was performed on a Waters 
ACQUITY UPLC I-Class/Xevo in line with a Waters Xevo G2 
Q;TOF mass spectrometer (Waters Corporation, Milford, MA, 
USA). Samples were separated on a Zorbax Eclipse Plus Acquity 
UPLC BEH C 18 (1.7 ^lm particle size) 2.1 mmx,50 mm. The 
column temperature was maintained at 40°C. The flow rate was 
0.50 niL/min. The mobile phase consisted of acetonitrUe (A) and 



water containing 0.1% (v/v) formic acid. The gradient was 
initiated with increasing to 5% A within 1 min, 5% A and then 
linearly increased to 12% A within 2 min, then 12% A increased 
to 20% A within 3 min, 20% A increased to 30% A within 4 min, 
and 30% A increased to 35% A within 0.5 min. Next, 35% A 
increased to 45% A within 3 min and 45% A was increased to 
50% A within 1.5 min. Finally, 50% A increased to 95% A within 
3 min and 95% A increased to 100% A within 1 min. 

The Xevo G2-S Q;TOF mass spectrometer was run in negative 
mode. The data were collected for each test sample from 50 Da to 
1500 Da. High purity nitrogen (Nj) was used as nebuhzing gas, 
and ultra-high pure helium (He) as collision gas. Source 
parameters were as follows: capillary voltage, 2.50 kV; sampling 
cone voltage, 40.0 V; source offset, 100 V; desolvation tempera- 
ture, 500°C; cone gas flow, 50 L«h and desolvation gas flow, 
800 L'h"'. To ensure mass accuracy and reproducibility, leucine- 
enkephalin was used as the reference lock mass (m/z 554.2615) 
with the LockSpray interface. 

UPLC/Q-TOF MS data acquisition and analysis 

Waters' metabolomics package, namely TransOmics, developed 
with Nonlinear Dynamics, can process large sample sets, 
automatically detect features, and perform a quantitative com- 
parison with retention time (Rp) and m/ z data pairs as identifiers. 
The result of roots, stems, leaves and seeds samples were imported 
to SIMCA-P13.0 software package (version 13.0, Umetrics, Umea, 
Sweden) for multivariate statistical analysis. 

Principal component analysis (PCA) and partialleast-squares- 
discriminant analysis (PLS-DA) were run to separate between 
different tissue groups. Pareto scaling was used in all the models to 
avoid chemical noise. To evaluate the quality of the model, R2 
and Q_2 values were also calculated. R2 displays the variance 
explained in the model and indicates the goodness of fit. Q2 
displays the variance in data predicted by the model and indicates 
predictability. Potential biomarkers were selected according to the 
loading plot and Variable Importance in the Project (VIP) values. 
Compound identification of metabolites was performed based on 
their retention times, exact mass information and fragment ions 
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according to previous studies [5,7,8,66,67]. Ttie peak area 
intensity analysis of potential biomarkers and triterpenoid saponins 
were also run to find the variance between different tissue groups. 

Total RNA extraction and cDNA synthesis. Total RNA 
was extracted from the different tissues by using liNAiso Plus kit 
according to the manufacturer's protocol. RNA concentration and 
purity were estimated from the optical density at Ajeo/ Ageo ratio 
using the NanoDrop 2000 Spectrophotometer (Thermo Fisher 
Scientific, Wilmington, DE). The integrity of RNA was evaluated 
by electrophoresis. cDNA was amplified from the total RNA 
(500 ng) by using a PrimerScript RT master mix perfect real-time 
kit. 

qRT-PCR analysis 

qRT-PCR was performed using the ABI StepOne/StepOne 
Plus (Applied Biosystems, USA) with a final volume of 20 [iL 
containing cDNA, SYBR Green PGR master mix, ROX and 
primers. All qRT-PCR reactions were carried out in technical and 
biological triplicate. The amplification procedure included three 
stages: a holding stage for 10 min at 95°C; a cycling stage with 40 
cycles consisting of 15 s at 95°C for denaturation and 60 s at 60°G 
for primer annealing; and a melt-curve stage consisting of 15 s at 
95°C, 1 min at 60"G and 15 s at 95 "G [68]. 

Melt-curve analysis was performed to confirm amplicon 
specificity. AU the primers for the 7 reference genes and 15 
putative genes involved in triterpenoid saponin biosynthesis used 
for amplification were designed using Primer 5 software, using the 
following criteria: melting temperatures of 59°G to 61°C, primer 
lengths of 21 bp to 25 bp and amplicon lengths of 80 bp to 202 bp 
(Table I). The gene used as an internal control was chosen by th(" 
mean threshold cycle (Ct) values. Data were then presented 
according to the 2 ^^^^ method [69]. Measures were obtained 
for each condition from cDNA, which had been synthesized from 
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